# Codebooks for: "The Causal Effects of Political Incivility in Social Media Discussions"

This document describes every variable in each dataset used in the paper. Documentation for the 2024 ANES Time Series Study is included separately in the replication file.

---

## all-bot-comments-2025-10-12

**Platform Data** | N = 20,206 rows | 5 columns

Synthetic user (bot) comments delivered to participants on the Spark Social platform.

### Variable Dictionary

| variable | type | description | allowed values / range | missing (%) | notes |
|:---------|:-----|:------------|:-----------------------|:------------|:------|
| participant_code | character | Unique participant identifier | 1,515+ unique codes (e.g., "tgyf8q") | 0% | Identifier; links survey and platform data |
| postId | character | ID of the post being commented on | 471+ unique IDs | 0% | Identifier; links to posts |
| text | character | Text content of bot comment | 20,206 unique values | 0% | Text field; GPT-generated content |
| bot_number | numeric | Bot identifier number | min=1, max=22 | 0% | Bot index (22 total bots) |

---

## all-comments-2025-10-12

**Platform Data** | N = 4,440 rows | 19 columns

Participant-generated comments on the Spark Social platform.

### Variable Dictionary

| variable | type | description | allowed values / range | missing (%) | notes |
|:---------|:-----|:------------|:-----------------------|:------------|:------|
| participant_code | character | Unique participant identifier | 1,161+ unique codes | 0% | Identifier; links survey and platform data  |
| postId | character | Unique identifier of commented post | 459+ unique IDs | 0% | Identifier; links to posts |
| text | character | Comment text content | 4,170+ unique values | 0.99%; due to removal of emojis in pre-processing | Text field; user-generated |
| chars.comments | numeric | Character count | min=1, median=58, mean=79, max=829 | 0.88% | Derived metric |
| tokens.comments | numeric | Token count | min=0, median=11, mean=14.9, max=156 | 0.88% | Derived metric |
| tox | numeric | Toxicity score (Perspective API) | min=0.003, median=0.02, mean=0.06, max=0.94 | 1.89% | Outcome; 0-1 scale |
| vader | numeric | VADER sentiment score | min=-0.975, median=0.128, mean=0.17, max=0.991 | 0.99% | Outcome; -1 to 1 scale |
| identity.attack | numeric | Identity attack score (Perspective API) | min=0.0004, median=0.003, mean=0.008, max=0.57 | 1.89% | Outcome; Perspective API |
| insult | numeric | Insult score (Perspective API) | min=0.006, median=0.009, mean=0.035, max=0.87 | 1.89% | Outcome; Perspective API |
| profanity | numeric | Profanity score (Perspective API) | min=0.009, median=0.014, mean=0.033, max=0.90 | 1.89% | Outcome; Perspective API |
| politeness | numeric | Politeness score (standardized) | min=-0.999, median=-0.054, mean~0, max=2.53 | 0.88% | Outcome; standardized |
| prepop.post | character | Pre-populated post text (if applicable) | Civil/Uncivil/Filler/Bonus posts | 21.08% | Treatment stimuli text |
| post.type | character | Type of post commented on | Civil, Uncivil, Filler, Bonus | 21.08% | Treatment indicator |
| post.party | character | Party of post author | Republican, Democratic, None | 21.08% | Stimulus attribute |
| relativeTime_sec | numeric | Seconds since newsfeed start | min=11.8, median=366.3, mean=373.6, max=809.5 | 0% | Derived timing metric |
| relativeTime_min | numeric | Minutes since newsfeed start | min=0.20, median=6.1, mean=6.2, max=12 | 0% | Derived timing metric |
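
The per-participant comment aggregates documented under `civility-full-ajps` (for example `n.comments` and `tox.comments`) can be understood as summaries of the rows in this file. The following is a minimal sketch of that kind of aggregation, not the authors' replication code. The sample rows are hypothetical, and counting comments whose toxicity score is missing toward `n.comments` is an assumption.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows shaped like all-comments (only the columns used here).
comments = [
    {"participant_code": "aaa111", "tox": 0.02},
    {"participant_code": "aaa111", "tox": 0.10},
    {"participant_code": "bbb222", "tox": None},  # toxicity score missing
]

def aggregate_comments(rows):
    """Return {participant_code: {"n.comments": int, "tox.comments": float or None}}."""
    counts = defaultdict(int)
    tox_by_user = defaultdict(list)
    for row in rows:
        code = row["participant_code"]
        counts[code] += 1  # assumption: every row counts as one comment
        if row["tox"] is not None:
            tox_by_user[code].append(row["tox"])
    return {
        code: {
            "n.comments": n,
            # Mean over scored comments only; None when nothing was scored.
            "tox.comments": mean(tox_by_user[code]) if tox_by_user[code] else None,
        }
        for code, n in counts.items()
    }

summary = aggregate_comments(comments)
```

Participants with no rows in this file would not appear in `summary` at all, which is one plausible reason the mean-score variables in the main dataset carry missing values while the count variables do not.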

---

## all-posts-2025-10-12

**Platform Data** | N = 964 rows | 13 columns

Participant-generated posts on the Spark Social platform.

### Variable Dictionary

| variable | type | description | allowed values / range | missing (%) | notes |
|:---------|:-----|:------------|:-----------------------|:------------|:------|
| participant_code | character | Unique participant identifier | 807+ unique codes | 0% | Identifier; links survey and platform data  |
| postId | character | Unique post identifier | 964 unique IDs | 0% | Identifier |
| text | character | Post text content | 940+ unique values | 0.21%; due to removal of emojis in pre-processing | Text field; user-generated |
| relativeTime | numeric | Seconds since newsfeed start | min=150, median=707.5, mean=746.7, max=4000 | 0% | Max=4000 outlier |
| chars.posts | numeric | Character count | min=1, median=74, mean=105, max=897 | 0.21% | Derived metric |
| tokens.posts | numeric | Token count | min=0, median=14, mean=19.6, max=172 | 0.21% | Derived metric |
| tox | numeric | Toxicity score (Perspective API) | min=0.005, median=0.025, mean=0.069, max=0.79 | 1.35% | Outcome |
| vader | numeric | VADER sentiment score | min=-0.964, median=0, mean=0.15, max=0.97 | 0.21% | Outcome; -1 to 1 scale |
| identity.attack | numeric | Identity attack score (Perspective API) | min=0.001, median=0.004, mean=0.011, max=0.51 | 1.35% | Outcome |
| insult | numeric | Insult score (Perspective API) | min=0.006, median=0.011, mean=0.039, max=0.79 | 1.35% | Outcome |
| profanity | numeric | Profanity score (Perspective API) | min=0.009, median=0.015, mean=0.035, max=0.86 | 1.35% | Outcome |
| politeness | numeric | Politeness score | min=-0.88, median=-0.047, mean~0, max=1.34 | 0.21% | Outcome |

---

## civility-intake-attrition

**Survey Data** | N = 1,652 rows | 10 columns

Intake/eligibility survey data for attrition analysis.

### Variable Dictionary

| variable | type | description | allowed values / range | missing (%) | notes |
|:---------|:-----|:------------|:-----------------------|:------------|:------|
| participant_code | character | Unique participant identifier | 1,593+ unique codes | 3.57% | Identifier; links survey and platform data  |
| anon_id | character | Anonymized Prolific ID | 1,652 unique hashed IDs | 0% | Identifier; links to Prolific |
| consent | character | Consent to participate | "1.0" (consented), "0.0" (not) | 0% | Eligibility criteria |
| platforms.cc | numeric | ChatterChain usage (fake platform) | min=1, median=1, mean=1.03, max=3 | 0.18% | Attention check; failure = survey terminated |
| device | numeric | Device type | min=3, median=3, mean=3.4, max=5 | 2.66% | Eligibility criteria; 3=iPhone, 4=Android, 5=something else |
| download.apple | numeric | Willing to download iOS app | 1=yes, 2=no | 41.89% | Eligibility criteria; missing for Android users |
| download.android | numeric | Willing to download Android app | 1=yes, 2=no | 61.08% | Eligibility criteria; missing for iPhone users |
| screen.image | character | Image attention check response | "3.0" (correct) | 3.39% | Attention check; failure = survey terminated |
| age | numeric | Participant age | min=3, median=35, mean=36.5, max=78 | 2.60% | Eligibility criteria of 18+ |
| state | numeric | U.S. state of residence | (1) valid US state, (0) does not reside in US | 2.54% | Eligibility criteria of residing in US |
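
The eligibility criteria and attention checks listed above can be combined into a single screening rule. The sketch below is illustrative only. It assumes that `platforms.cc == 1` means the respondent reported not using the fake platform, and that respondents with `device == 5` are ineligible; neither coding is stated in this codebook. The sample rows are hypothetical.

```python
def is_eligible(row):
    """Apply the intake screens to one row shaped like civility-intake-attrition."""
    if row.get("consent") != "1.0":
        return False
    # Assumption: 1 = reports no use of the fake platform (passes the check).
    if row.get("platforms.cc") != 1:
        return False
    if row.get("screen.image") != "3.0":
        return False
    age = row.get("age")
    if age is None or age < 18:
        return False
    if row.get("state") != 1:
        return False
    device = row.get("device")
    if device == 3:  # iPhone: must be willing to download the iOS app
        return row.get("download.apple") == 1
    if device == 4:  # Android: must be willing to download the Android app
        return row.get("download.android") == 1
    return False  # assumption: any other device is ineligible

# Hypothetical respondents.
respondents = [
    {"participant_code": "aaa111", "consent": "1.0", "platforms.cc": 1,
     "device": 3, "download.apple": 1, "screen.image": "3.0", "age": 35, "state": 1},
    {"participant_code": "bbb222", "consent": "1.0", "platforms.cc": 3,
     "device": 4, "download.android": 1, "screen.image": "3.0", "age": 29, "state": 1},
    {"participant_code": "ccc333", "consent": "1.0", "platforms.cc": 1,
     "device": 4, "download.android": 1, "screen.image": "3.0", "age": 17, "state": 1},
]

eligible_codes = [r["participant_code"] for r in respondents if is_eligible(r)]
```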

---

## civility-pre-attrition

**Survey Data** | N = 1,522 rows | 2 columns

Pre-treatment survey completion tracking for attrition analysis.

### Variable Dictionary

| variable | type | description | allowed values / range | missing (%) | notes |
|:---------|:-----|:------------|:-----------------------|:------------|:------|
| participant_code | character | Unique participant identifier | 1,522 unique codes | 0% | Identifier; links survey and platform data  |
| pre.progress | character | Pre-treatment survey progress | "100.0" (complete), "9.0"-"95.0" (partial) | 0% | Completion indicator |

---

## civility-post-attrition

**Survey Data** | N = 1,483 rows | 2 columns

Post-treatment survey completion tracking for attrition analysis.

### Variable Dictionary

| variable | type | description | allowed values / range | missing (%) | notes |
|:---------|:-----|:------------|:-----------------------|:------------|:------|
| participant_code | character | Unique participant identifier | 1,483 unique codes | 0% | Identifier; links survey and platform data  |
| post.progress | character | Post-treatment survey progress | "100" (complete), "34"-"94" (partial) | 0% | Completion indicator |
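
Because `pre.progress` and `post.progress` are stored as character strings in slightly different formats ("100.0" versus "100"), a completion indicator is easiest to derive after converting to a number. The following sketch shows one way to classify attrition stage from the two tracking files. The status labels and the sample data are hypothetical.

```python
def is_complete(progress):
    """True when a progress string such as "100.0" or "100" denotes completion."""
    return float(progress) == 100.0

# Hypothetical lookups keyed by participant_code.
pre_progress = {"aaa111": "100.0", "bbb222": "57.0", "ccc333": "100.0"}
post_progress = {"aaa111": "100", "ccc333": "34"}

def attrition_stage(code):
    """Classify a participant by the last survey stage they completed."""
    if code not in pre_progress or not is_complete(pre_progress[code]):
        return "incomplete pre-treatment survey"
    if code not in post_progress or not is_complete(post_progress[code]):
        return "incomplete post-treatment survey"
    return "completed both surveys"

stages = {code: attrition_stage(code) for code in pre_progress}
```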

---

## civility-full-ajps

**Survey Data** | N = 1,461 rows | 140 columns

Main analysis dataset with survey responses and aggregated platform activity.

### Variable Dictionary

| variable | type | description | allowed values / range | missing (%) | notes |
|:---------|:-----|:------------|:-----------------------|:------------|:------|
| participant_code | character | Unique participant identifier | 1,461 unique codes | 0% | Identifier; links survey and platform data  |
| condition | character | Treatment condition | civil (734), uncivil (727) | 0% | Treatment assignment |
| condition.uncivil | numeric | Uncivil condition indicator | 0=civil, 1=uncivil | 0% | Treatment indicator |
| age | numeric | Participant age | min=18, median=35, mean=36.4, max=78 | 0% | Demographic covariate |
| age.under40 | numeric | Age < 40 indicator | 0/1 | 0% | Derived |
| age.40orover | numeric | Age >= 40 indicator | 0/1 | 0% | Derived |
| device | character | Device type | Apple iPhone (861), Android (600) | 0% | Covariate |
| download.apple | numeric | iOS download willingness | 1 (all yes) | 41.07% | Missing = Android users |
| download.android | numeric | Android download willingness | 1 (all yes) | 58.93% | Missing = iPhone users |
| educ | numeric | Education level | min=1, median=4, mean=4.0, max=6 | 0% | Demographic; ordinal |
| educ.less | numeric | < HS indicator | 0/1 | 0% | Derived |
| educ.hs | numeric | HS diploma indicator | 0/1 | 0% | Derived |
| educ.somecol | numeric | Some college indicator | 0/1 | 0% | Derived |
| educ.college | numeric | College+ indicator | 0/1 (mean=0.58) | 0% | Derived |
| educ4 | character | Education (4 categories) | Less than HS, HS, Some college, College | 0% | Demographic |
| race | character | Race (raw) | Multiple codes (e.g., "4", "3,4") | 0% | Multi-select; needs recoding |
| race.asian | numeric | Asian indicator | 0/1 | 0% | Derived |
| race.black | numeric | Black indicator | 0/1 | 0% | Derived |
| race.hispanic | numeric | Hispanic indicator | 0/1 | 0% | Derived |
| race.white | numeric | White indicator | 0/1 | 0% | Derived |
| race.other | numeric | Other race indicator | 0/1 | 0% | Derived |
| race6 | character | Race (6 categories) | White, Black, Hispanic, Asian, Mixed/Other | 0% | Demographic |
| race6.asian | numeric | Asian (exclusive) | 0/1 | 0% | Derived |
| race6.black | numeric | Black (exclusive) | 0/1 | 0% | Derived |
| race6.hispanic | numeric | Hispanic (exclusive) | 0/1 | 0% | Derived |
| race6.white | numeric | White (exclusive) | 0/1 | 0% | Derived |
| race6.other | numeric | Mixed/Other (exclusive) | 0/1 | 0% | Derived |
| gender | numeric | Gender (raw) | 1-3 | 0% | 1=Man, 2=Woman, 3=Other |
| female | character | Gender (binary) | Man, Woman | 1.57% | Missing = "Something else" |
| female.numeric | numeric | Female indicator | 0/1 | 1.57% | Derived |
| gender3 | character | Gender (3 categories) | Man, Woman, Something else | 0% | Demographic |
| pid3 | character | Party ID (3-way) | Democratic, Republican, Pure Independent | 0% | Covariate |
| pid7 | numeric | Party ID (7-point) | min=1, median=4, mean=3.8, max=7 | 0% | 1=Strong Dem, 7=Strong Rep |
| pid4 | character | Party ID (4-way) | Democratic, Republican, Independent, Other | 0% | Covariate |
| pid.dem | numeric | Democrat indicator | 0/1 | 0% | Derived |
| pid.rep | numeric | Republican indicator | 0/1 | 0% | Derived |
| pid.ind | numeric | Independent indicator | 0/1 | 0% | Derived |
| pid.dem.rep | character | Party (Dem/Rep only) | Democratic, Republican | 9.86% | Missing = Independents |
| party.strength | numeric | Party strength | min=0, median=2, mean=1.9, max=3 | 0% | Derived |
| pol.int | numeric | Political interest | min=1, median=3, mean=3.3, max=5 | 0% | Covariate |
| visit.fb | numeric | Visited Facebook | 0/1 | 0% | Social media use |
| visit.x | numeric | Visited X/Twitter | 0/1 | 0% | Social media use |
| visit.yt | numeric | Visited YouTube | 0/1 | 0% | Social media use |
| visit.tiktok | numeric | Visited TikTok | 0/1 | 0% | Social media use |
| visit.insta | numeric | Visited Instagram | 0/1 | 0% | Social media use |
| visit.truth | numeric | Visited Truth Social | 0/1 | 0% | Social media use |
| visit.snap | numeric | Visited Snapchat | 0/1 | 0% | Social media use |
| visit.reddit | numeric | Visited Reddit | 0/1 | 0% | Social media use |
| visit.parlor | numeric | Visited Parler | 0/1 | 0% | Social media use |
| visit.cc | numeric | Visited ChatterChain | 0 (all) | 0% | Attention check passed |
| sm.count | numeric | SM platforms visited | min=0, median=5, mean=5.3, max=7 | 0% | Derived |
| screen.image | character | Image attention check | "3.0" (all correct) | 0% | All passed |
| comfort.sharing | numeric | Comfort sharing opinions | min=1, median=3, mean=3.2, max=5 | 0% | Covariate |
| pre.climplc.dem | numeric | Pre: Dem climate placement | min=1, median=6, mean=5.8, max=7 | 0% | Pre-treatment outcome |
| pre.climplc.rep | numeric | Pre: Rep climate placement | min=1, median=2, mean=2.7, max=7 | 0% | Pre-treatment outcome |
| post.climplc.dem | numeric | Post: Dem climate placement | min=1, median=6, mean=5.9, max=7 | 0.14% | Post-treatment outcome |
| post.climplc.rep | numeric | Post: Rep climate placement | min=1, median=2, mean=2.5, max=7 | 0.14% | Post-treatment outcome |
| pre.clim.polar | numeric | Pre: Climate polarization | min=0, median=4, mean=3.4, max=6 | 0% | Derived outcome |
| post.clim.polar | numeric | Post: Climate polarization | min=0, median=4, mean=3.7, max=6 | 0.27% | Derived outcome |
| diff.clim.polar | numeric | Change in climate polar. | min=-6, median=0, mean=0.29, max=6 | 0.27% | Derived outcome |
| pre.therm.dem | numeric | Pre: Dem feeling therm | min=0, median=50, mean=51.8, max=100 | 0% | Pre-treatment outcome |
| pre.therm.rep | numeric | Pre: Rep feeling therm | min=0, median=45, mean=42.7, max=100 | 0% | Pre-treatment outcome |
| pre.therm.inparty | numeric | Pre: In-party therm | min=0, median=70, mean=66.8, max=100 | 9.86% | Missing = Independents |
| pre.therm.outparty | numeric | Pre: Out-party therm | min=0, median=30, mean=29.2, max=100 | 9.86% | Missing = Independents |
| pre.aff.pol | numeric | Pre: Affective polarization | min=-100, median=40, mean=37.6, max=100 | 9.86% | Derived; inparty - outparty |
| post.therm.dem | numeric | Post: Dem feeling therm | min=0, median=50, mean=51.6, max=100 | 0% | Post-treatment outcome |
| post.therm.rep | numeric | Post: Rep feeling therm | min=0, median=40, mean=42.3, max=100 | 0% | Post-treatment outcome |
| post.therm.inparty | numeric | Post: In-party therm | min=0, median=70, mean=66.4, max=100 | 9.86% | Post-treatment outcome |
| post.therm.outparty | numeric | Post: Out-party therm | min=0, median=30, mean=29.0, max=100 | 9.86% | Post-treatment outcome |
| post.aff.pol | numeric | Post: Affective polarization | min=-90, median=40, mean=37.4, max=100 | 9.86% | Derived outcome |
| diff.therm.inparty | numeric | Change in in-party therm | min=-80, median=0, mean=-0.35, max=100 | 9.86% | Derived outcome |
| diff.therm.outparty | numeric | Change in out-party therm | min=-100, median=0, mean=-0.18, max=75 | 9.86% | Derived outcome |
| diff.aff.pol | numeric | Change in affective polar. | min=-140, median=0, mean=-0.17, max=200 | 9.86% | Derived outcome |
| pre.trust | numeric | Pre: Political trust | min=1, median=2, mean=2.15, max=5 | 0.07% | Pre-treatment outcome |
| post.trust | numeric | Post: Political trust | min=1, median=2, mean=2.15, max=4 | 0% | Post-treatment outcome |
| diff.trust | numeric | Change in trust | min=-3, median=0, mean~0, max=2 | 0.07% | Derived outcome |
| pre.democ2 | numeric | Pre: Satisfaction w/ democracy | min=1, median=3, mean=2.8, max=7 | 0% | Pre-treatment outcome |
| post.democ2 | numeric | Post: Satisfaction w/ democracy | min=1, median=3, mean=2.9, max=7 | 0% | Post-treatment outcome |
| diff.democ | numeric | Change in democ satisfaction | min=-4, median=0, mean=0.07, max=6 | 0% | Derived outcome |
| bots | numeric | Perceived bot presence | min=1, median=2, mean=2.2, max=4 | 0.07% | Manipulation check |
| manip | numeric | Manipulation check response | 1-2 | 0% | 1=civil, 2=uncivil |
| manip.correct | numeric | Manipulation check correct | 0/1 (mean=0.69) | 0% | Manipulation check |
| dq.flag.speed | numeric | DQ: Speed flag | 0/1 (mean=0.002) | 0% | Data quality flag |
| dq.flag.zip | numeric | DQ: ZIP code flag | 0/1 (mean=0.005) | 0% | Data quality flag |
| dq.flag.oe | numeric | DQ: Open-end flag | 0/1 (mean=0.003) | 0% | Data quality flag |
| dq.flags | numeric | DQ: Any flag | 0/1 (mean=0.01) | 0% | Combined DQ indicator |
| dq.removal | numeric | DQ: Removed | 0 (all) | 0% | None removed in final sample |
| n.comments | numeric | Total comments made | min=0, median=2, mean=3.0, max=22 | 0% | Behavioral outcome |
| mean.chars.comments | numeric | Mean chars per comment | min=0, median=51.6, mean=64.3, max=405 | 0% | Derived |
| mean.tokens.comments | numeric | Mean tokens per comment | min=0, median=9.75, mean=12.0, max=73.7 | 0% | Derived |
| sum.chars.comments | numeric | Total chars in comments | min=0, median=145, mean=232.8, max=2281 | 0% | Derived |
| sum.tokens.comments | numeric | Total tokens in comments | min=0, median=28, mean=43.8, max=425 | 0% | Derived |
| tox.comments | numeric | Mean toxicity of comments | min=0.005, median=0.026, mean=0.054, max=0.74 | 20.88% | Behavioral outcome |
| tox.comments.outparty | numeric | Toxicity on out-party posts | min=0.003, median=0.021, mean=0.054, max=0.75 | 55.44% | Behavioral outcome |
| identity.attack.comments | numeric | Mean identity attack | min=0.001, median=0.004, mean=0.007, max=0.21 | 20.88% | Behavioral outcome |
| insult.comments | numeric | Mean insult score | min=0.006, median=0.011, mean=0.030, max=0.75 | 20.88% | Behavioral outcome |
| profanity.comments | numeric | Mean profanity score (comments) | min=0.009, median=0.015, mean=0.029, max=0.45 | 20.88% | Behavioral outcome |
| politeness.comments | numeric | Mean politeness (comments) | min=-0.59, median=-0.018, mean~0, max=1.26 | 20.81% | Behavioral outcome |
| vader.comments | numeric | Mean VADER sentiment (comments) | min=-0.9, median=0.18, mean=0.18, max=0.89 | 20.81% | Behavioral outcome |
| made.comment | numeric | Made any comment | 0/1 (mean=0.79) | 0% | Behavioral outcome |
| n.comments.civil | numeric | Comments on civil posts | min=0, median=0, mean=1.0, max=13 | 0% | Behavioral outcome |
| n.comments.uncivil | numeric | Comments on uncivil posts | min=0, median=0, mean=0.98, max=18 | 0% | Behavioral outcome |
| n.comments.filler | numeric | Comments on filler posts | min=0, median=0, mean=0.25, max=5 | 0% | Behavioral outcome |
| n.comments.own | numeric | Comments on own posts | min=0, median=0, mean=0.64, max=15 | 0% | Behavioral outcome |
| n.comments.bonus | numeric | Comments on bonus posts | min=0, median=0, mean=0.16, max=5 | 0% | Behavioral outcome |
| commented.civil | numeric | Commented on civil post | 0/1 | 0% | Binary outcome |
| commented.uncivil | numeric | Commented on uncivil post | 0/1 | 0% | Binary outcome |
| commented.bonus | numeric | Commented on bonus post | 0/1 | 0% | Binary outcome |
| commented.filler | numeric | Commented on filler post | 0/1 | 0% | Binary outcome |
| commented.own | numeric | Commented on own post | 0/1 | 0% | Binary outcome |
| n.comments.majority | numeric | Comments on majority-condition posts (posts matching participant's assigned condition) | min=0, median=1, mean=1.8, max=18 | 0% | Derived; corresponds to in-condition posts |
| n.comments.minority | numeric | Comments on minority-condition posts (posts not matching participant's assigned condition) | min=0, median=0, mean=0.20, max=4 | 0% | Derived; corresponds to out-of-condition posts |
| n.comments.political | numeric | Comments on political posts | min=0, median=1, mean=2.0, max=18 | 0% | Behavioral outcome |
| n.comments.nonpolitical | numeric | Comments on non-political posts | min=0, median=0, mean=0.40, max=6 | 0% | Behavioral outcome |
| commented.majority | numeric | Commented on majority-type post | 0/1 | 0% | Binary outcome |
| commented.minority | numeric | Commented on minority-type post | 0/1 | 0% | Binary outcome |
| commented.political | numeric | Commented on political post | 0/1 | 0% | Binary outcome |
| commented.nonpolitical | numeric | Commented on non-political post | 0/1 | 0% | Binary outcome |
| n.comments.dem | numeric | Comments on Dem posts | min=0, median=1, mean=0.98, max=15 | 0% | Behavioral outcome |
| n.comments.rep | numeric | Comments on Rep posts | min=0, median=0, mean=1.0, max=12 | 0% | Behavioral outcome |
| commented.dem | numeric | Commented on Dem post | 0/1 | 0% | Binary outcome |
| commented.rep | numeric | Commented on Rep post | 0/1 | 0% | Binary outcome |
| n.comments.inparty | numeric | Comments on in-party posts | min=0, median=0, mean=0.94, max=12 | 9.86% | Missing = Independents |
| n.comments.outparty | numeric | Comments on out-party posts | min=0, median=0, mean=1.09, max=15 | 9.86% | Missing = Independents |
| commented.inparty | numeric | Commented on in-party post | 0/1 | 9.86% | Missing = Independents |
| commented.outparty | numeric | Commented on out-party post | 0/1 | 9.86% | Missing = Independents |
| n.posts | numeric | Total posts made | min=0, median=1, mean=0.66, max=6 | 0% | Behavioral outcome |
| mean.chars.posts | numeric | Mean chars per post | min=0, median=18, mean=62.0, max=897 | 0% | Derived |
| mean.tokens.posts | numeric | Mean tokens per post | min=0, median=4, mean=11.6, max=172 | 0% | Derived |
| sum.chars.posts | numeric | Total chars in posts | min=0, median=21, mean=69.2, max=897 | 0% | Derived |
| sum.tokens.posts | numeric | Total tokens in posts | min=0, median=4, mean=12.9, max=172 | 0% | Derived |
| tox.posts | numeric | Mean toxicity of posts | min=0.005, median=0.025, mean=0.068, max=0.79 | 45.17% | Behavioral outcome |
| identity.attack.posts | numeric | Mean identity attack (posts) | min=0.001, median=0.004, mean=0.012, max=0.51 | 45.17% | Behavioral outcome |
| insult.posts | numeric | Mean insult (posts) | min=0.006, median=0.011, mean=0.040, max=0.79 | 45.17% | Behavioral outcome |
| profanity.posts | numeric | Mean profanity (posts) | min=0.009, median=0.015, mean=0.034, max=0.72 | 45.17% | Behavioral outcome |
| politeness.posts | numeric | Mean politeness (posts) | min=-0.88, median=-0.047, mean~0, max=1.34 | 44.90% | Behavioral outcome |
| vader.posts | numeric | Mean VADER (posts) | min=-0.945, median=0.103, mean=0.16, max=0.97 | 44.90% | Behavioral outcome |
| made.post | numeric | Made any post | 0/1 (mean=0.55) | 0% | Behavioral outcome |
| n.bot.comments | numeric | Bot comments received | min=0, median=12, mean=13.4, max=31 | 0% |  |
| n.content | numeric | Total user content | min=0, median=3, mean=3.7, max=24 | 0% | posts + comments |
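
Several outcomes in this table are derived from other columns. As an illustration, the sketch below computes affective polarization as in-party minus out-party thermometer ratings (the definition given in the notes for `pre.aff.pol`) and then a change score. Treating change scores as post minus pre is an assumption about the sign convention, and the respondent shown is hypothetical.

```python
def affective_polarization(therm_dem, therm_rep, party):
    """In-party minus out-party feeling thermometer; None for non-partisans."""
    if party == "Democratic":
        return therm_dem - therm_rep
    if party == "Republican":
        return therm_rep - therm_dem
    return None  # matches the codebook: missing for Independents

def change_score(pre, post):
    """Assumed convention: post-treatment value minus pre-treatment value."""
    if pre is None or post is None:
        return None
    return post - pre

# Hypothetical respondent using the column names from this table.
respondent = {
    "pid.dem.rep": "Republican",
    "pre.therm.dem": 20, "pre.therm.rep": 80,
    "post.therm.dem": 30, "post.therm.rep": 75,
}

pre_aff = affective_polarization(
    respondent["pre.therm.dem"], respondent["pre.therm.rep"], respondent["pid.dem.rep"])
post_aff = affective_polarization(
    respondent["post.therm.dem"], respondent["post.therm.rep"], respondent["pid.dem.rep"])
diff_aff = change_score(pre_aff, post_aff)
```

With thermometers bounded at 0 and 100, the level measure can range from -100 to 100 and the change score from -200 to 200, which agrees with the ranges reported for `pre.aff.pol` and `diff.aff.pol`.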

---

## posts

**Reference Data** | N = 28 rows | 4 columns

Researcher-generated posts that populated the civil and uncivil newsfeeds on Spark Social. This file serves as a reference catalog of the treatment stimuli; it links to `all-comments` via `postId`.

### Variable Dictionary

| variable | type | description | allowed values / range | missing (%) | notes |
|:---------|:-----|:------------|:-----------------------|:------------|:------|
| post | character | Text content of the researcher-generated post | 28 unique posts | 0% | Treatment stimulus text |
| post.type | character | Civility type of post | Civil, Uncivil, Filler, Bonus | 0% | Stimulus categorization |
| party | character | Party affiliation of the post author bot | Democratic, Republican, None | 0% | Stimulus attribute; None for Filler and Bonus posts |
| postId | character | Unique post identifier | 28 unique IDs | 0% | Identifier; links to `postId` in `all-comments` and `all-posts` |
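
Since this catalog links to `all-comments` through `postId`, stimulus attributes can be attached to each comment with a simple lookup. The sketch below uses hypothetical IDs and rows. It also assumes that a comment whose `postId` is absent from the catalog (for example, a comment on a participant-generated post) should receive missing values, which would be one explanation for the missingness in `post.type` and `post.party` in `all-comments`.

```python
# Hypothetical catalog rows shaped like the posts file.
posts = [
    {"postId": "p01", "post.type": "Civil", "party": "Democratic"},
    {"postId": "p02", "post.type": "Filler", "party": "None"},
]

# Hypothetical comment rows shaped like all-comments.
comments = [
    {"participant_code": "aaa111", "postId": "p01"},
    {"participant_code": "aaa111", "postId": "u77"},  # not in the catalog
]

post_lookup = {p["postId"]: p for p in posts}

def attach_post_attributes(comment):
    """Left-join one comment to the catalog; unmatched comments get None."""
    match = post_lookup.get(comment["postId"])
    return {
        **comment,
        "post.type": match["post.type"] if match else None,
        "post.party": match["party"] if match else None,
    }

merged = [attach_post_attributes(c) for c in comments]
```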

---

## prolific-prepped

**Reference Data** | N = 1,520 rows | 2 columns

Anonymized Prolific participant data for payment verification.

### Variable Dictionary

| variable | type | description | allowed values / range | missing (%) | notes |
|:---------|:-----|:------------|:-----------------------|:------------|:------|
| anon_id | character | Anonymized Prolific ID | 1,520 unique hashed IDs | 0% | Identifier; links to intake survey |
| completion.code | character | Study completion status | "SUBMITTED CODE" (all) | 0% | All submitted completion code |

---

## TextMeasures-Posts

**Pretest Data** | N = 224 rows | 11 columns

NLP measures for researcher-generated posts used in pretesting.

### Variable Dictionary

| variable | type | description | allowed values / range | missing (%) | notes |
|:---------|:-----|:------------|:-----------------------|:------------|:------|
| condition | character | Newsfeed condition | Less Civil (112), More Civil (112) | 0% | Stimulus assignment |
| content.type | character | Post civility type | Civil (96), Uncivil (96), Filler (32) | 0% | Stimulus attribute |
| party | character | Post author party | Democratic (96), Republican (96) | 14.29% | Missing = Filler posts |
| bot.name | character | Bot account name | 21 unique names | 0% | Stimulus profile |
| bot | character | Bot identifier | 10 unique bots | 14.29% | Missing = Filler |
| picture | character | Post image (if any) | cheeseburger, Electric car, meme | 78.57% | Only some posts have images |
| relativeTime | numeric | Scheduled entry time of post relative to newsfeed start | min=-60, median=-18.5, mean=-20.2, max=10 | 0% | Minutes relative to participant's newsfeed start; negative = post pre-loaded before participant starts, positive = post enters feed after participant starts |
| text | character | Post text content | 26 unique posts | 0% | Stimulus text |
| method | character | NLP measurement method | 8 methods (toxicity, sentiment, etc.) | 0% | Long format by method |
| measure | numeric | NLP score value | min=-9, median=0.2, mean=0.11, max=6 | 0% | Pretest outcome |
| group | character | Measure category | Sentiment (168), Civility (28), Toxicity (28) | 0% | Measure grouping |
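
The file is in long format: each stimulus appears once per `method`, with the score in `measure`. For side-by-side comparison it can help to reshape to one row per stimulus. The sketch below does this with plain dictionaries. The method names and scores are hypothetical, and identifying a stimulus by the pair (`condition`, `text`) is an assumption.

```python
from collections import defaultdict

# Hypothetical long-format rows shaped like TextMeasures-Posts.
long_rows = [
    {"condition": "More Civil", "text": "Post A", "method": "toxicity", "measure": 0.05},
    {"condition": "More Civil", "text": "Post A", "method": "vader", "measure": 0.40},
    {"condition": "Less Civil", "text": "Post B", "method": "toxicity", "measure": 0.62},
]

def to_wide(rows):
    """Pivot long rows to {(condition, text): {method: measure}}."""
    wide = defaultdict(dict)
    for row in rows:
        key = (row["condition"], row["text"])  # assumed stimulus identifier
        wide[key][row["method"]] = row["measure"]
    return dict(wide)

wide = to_wide(long_rows)
```

A method that was not scored for a given stimulus is simply absent from that stimulus's inner dictionary, so lookups should use `.get(method)` rather than indexing.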

---

## TextMeasures-Comments

**Pretest Data** | N = 192 rows | 8 columns

NLP measures for researcher-generated comments used in pretesting.

### Variable Dictionary

| variable | type | description | allowed values / range | missing (%) | notes |
|:---------|:-----|:------------|:-----------------------|:------------|:------|
| condition | character | Newsfeed condition | Less Civil (96), More Civil (96) | 0% | Stimulus assignment |
| content.type | character | Comment civility type | Civil (96), Uncivil (96) | 0% | Stimulus attribute |
| party | character | Commenter party | Republican (104), Democratic (88) | 0% | Stimulus attribute |
| bot | character | Bot account name | 20 unique names | 0% | Stimulus profile |
| text | character | Comment text content | 24 unique comments | 0% | Stimulus text |
| method | character | NLP measurement method | 8 methods | 0% | Long format by method |
| measure | numeric | NLP score value | min=-7, median=0.2, mean=0.06, max=6 | 0.52% | Pretest outcome |
| group | character | Measure category | Sentiment (144), Civility (24), Toxicity (24) | 0% | Measure grouping |

---

## TextMeasures-GPT-Comments

**Pretest Data** | N = 3,456 rows | 15 columns

NLP measures for GPT-generated bot comments used in pretesting.

### Variable Dictionary

| variable | type | description | allowed values / range | missing (%) | notes |
|:---------|:-----|:------------|:-----------------------|:------------|:------|
| poster.party | character | Original poster's party | Democratic (1728), Republican (1728) | 0% | Post attribute |
| post.type | character | Post civility type | Civil (1728), Uncivil (1728) | 0% | Post attribute |
| post | character | Post text being replied to | 24 unique posts | 0% | Stimulus context |
| poster | character | Post author bot name | 10 unique bots | 0% | Post attribute |
| bot.civil | character | Civil bot persona prompt | 10 unique prompts | 0% | GPT prompt (civil version) |
| bot.uncivil | character | Uncivil bot persona prompt | 10 unique prompts | 0% | GPT prompt (uncivil version) |
| bot.party | character | Commenting bot's party | Democratic (1728), Republican (1728) | 0% | Bot attribute |
| bot | character | Commenting bot name | 10 unique bots | 0% | Bot identifier |
| prompt.civil | character | Full GPT prompt (civil) | Complex prompts with persona | 0% | Generation prompt |
| prompt.uncivil | character | Full GPT prompt (uncivil) | Complex prompts with persona | 0% | Generation prompt |
| content.type | character | Response civility type | response.civil, response.uncivil | 0% | Stimulus attribute |
| text | character | GPT-generated comment text | 432+ unique responses | 0% | Generated stimulus |
| method | character | NLP measurement method | 8 methods | 0% | Long format by method |
| measure | numeric | NLP score value | min=-8, median=0.3, mean=0.51, max=9 | 0.55% | Pretest outcome |
| group | character | Measure category | Sentiment, Civility, Toxicity | 0% | Measure grouping |

---

## ConnectMeasures-Posts

**Pretest Data** | N = 596 rows | 4 columns

Human civility ratings of researcher-generated posts from Connect platform.

### Variable Dictionary

| variable | type | description | allowed values / range | missing (%) | notes |
|:---------|:-----|:------------|:-----------------------|:------------|:------|
| text.type | character | Post civility type | Uncivil (299), Civil (297) | 0% | Stimulus attribute |
| text.party | character | Post author party | Democrat (301), Republican (295) | 0% | Stimulus attribute |
| text | character | Post text content | 24 unique posts | 0% | Stimulus text |
| rating | numeric | Civility rating | min=1, median=4, mean=4.36, max=7 | 0% | Human rating; 7-point scale |

---

## ConnectMeasures-Comments

**Pretest Data** | N = 596 rows | 5 columns

Human civility ratings of researcher-generated comments from Connect platform.

### Variable Dictionary

| variable | type | description | allowed values / range | missing (%) | notes |
|:---------|:-----|:------------|:-----------------------|:------------|:------|
| text.type | character | Comment civility type | Uncivil (302), Civil (294) | 0% | Stimulus attribute |
| text.party | character | Commenter party | Democrat (303), Republican (293) | 0% | Stimulus attribute |
| text | character | Comment text content | 24 unique comments | 0% | Stimulus text |
| parent.post | character | Parent post text | 12 unique parent posts | 0% | Context for comment |
| rating | numeric | Civility rating | min=1, median=5, mean=4.49, max=7 | 0% | Human rating; 7-point scale |

---

## ConnectMeasures-GPT-Comments

**Pretest Data** | N = 596 rows | 9 columns

Human civility ratings of GPT-generated comments from Connect platform.

### Variable Dictionary

| variable | type | description | allowed values / range | missing (%) | notes |
|:---------|:-----|:------------|:-----------------------|:------------|:------|
| poster.party | character | Original poster's party | Democratic (299), Republican (297) | 0% | Post attribute |
| post.type | character | Post civility type | Civil (304), Uncivil (292) | 0% | Post attribute |
| post | character | Post text being replied to | 24 unique posts | 0% | Context |
| bot.prompt | character | Bot persona prompt | 24 unique prompts | 0% | Generation context |
| bot.type | character | Bot civility type | Civil (303), Uncivil (293) | 0% | Stimulus attribute |
| bot.party | character | Commenting bot's party | Republican (301), Democratic (295) | 0% | Bot attribute |
| bot | character | Bot identifier | 12 unique bots | 0% | Bot name |
| text | character | GPT-generated comment text | 346+ unique responses | 0% | Generated stimulus |
| rating | numeric | Civility rating | min=1, median=5.5, mean=4.92, max=7 | 0% | Human rating; 7-point scale |

---

## ConnectMeasures-Posts-Pairs

**Pretest Data** | N = 447 rows | 6 columns

Paired comparison data for researcher-generated posts from Connect platform.

### Variable Dictionary

| variable | type | description | allowed values / range | missing (%) | notes |
|:---------|:-----|:------------|:-----------------------|:------------|:------|
| party | character | Post author party | Democrat (224), Republican (223) | 0% | Stimulus attribute |
| text1 | character | First post in pair | 12 unique posts | 0% | One member of pair |
| text2 | character | Second post in pair | 12 unique posts | 0% | Other member of pair |
| text1.type | character | First post civility | Civil (233), Uncivil (214) | 0% | Stimulus attribute |
| text2.type | character | Second post civility | Uncivil (233), Civil (214) | 0% | Stimulus attribute |
| correct3 | character | Correctly identified uncivil | Correct (424), Incorrect (23) | 0% | Validation outcome |

---

## ConnectMeasures-Comments-Pairs

**Pretest Data** | N = 447 rows | 8 columns

Paired comparison data for researcher-generated comments from Connect platform.

### Variable Dictionary

| variable | type | description | allowed values / range | missing (%) | notes |
|:---------|:-----|:------------|:-----------------------|:------------|:------|
| post.type | character | Parent post civility | Civil (331), Uncivil (116) | 0% | Context attribute |
| parent.post | character | Parent post text | 8 unique posts | 0% | Context |
| text.party | character | Commenter party | Democrat (230), Republican (217) | 0% | Stimulus attribute |
| text1 | character | First comment in pair | 12 unique comments | 0% | One member of pair |
| text2 | character | Second comment in pair | 12 unique comments | 0% | Other member of pair |
| text1.type | character | First comment civility | Civil (307), Uncivil (140) | 0% | Stimulus attribute |
| text2.type | character | Second comment civility | Uncivil (307), Civil (140) | 0% | Stimulus attribute |
| correct3 | character | Correctly identified uncivil | Correct (419), Incorrect (28) | 0% | Validation outcome |

---

## ConnectMeasures-GPT-Comments-Pairs

**Pretest Data** | N = 447 rows | 11 columns

Paired comparison data for GPT-generated comments from Connect platform.

### Variable Dictionary

| variable | type | description | allowed values / range | missing (%) | notes |
|:---------|:-----|:------------|:-----------------------|:------------|:------|
| bot.civil | character | Civil bot persona prompt | 12 unique prompts | 0% | Generation context |
| bot.uncivil | character | Uncivil bot persona prompt | 12 unique prompts | 0% | Generation context |
| bot.party | character | Commenting bot's party | Republican (234), Democratic (213) | 0% | Bot attribute |
| bot | character | Bot identifier | 12 unique bots | 0% | Bot name |
| poster.party | character | Original poster's party | Republican (231), Democratic (216) | 0% | Post attribute |
| post | character | Post text being replied to | 24 unique posts | 0% | Context |
| text1 | character | First GPT comment in pair | 212+ unique | 0% | One member of pair |
| text2 | character | Second GPT comment in pair | 212+ unique | 0% | Other member of pair |
| text1.type | character | First comment civility | Civil (244), Uncivil (203) | 0% | Stimulus attribute |
| text2.type | character | Second comment civility | Uncivil (244), Civil (203) | 0% | Stimulus attribute |
| correct3 | character | Correctly identified uncivil | Correct (413), Incorrect (34) | 0% | Validation outcome |
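
The `correct3` counts reported for the three paired-comparison files translate directly into identification rates. The short sketch below computes the share of pairs in which raters correctly identified the uncivil item, using only the counts listed in the tables above. The dictionary layout is for illustration.

```python
# Correct / Incorrect counts of correct3, copied from the three pairs tables.
pair_counts = {
    "ConnectMeasures-Posts-Pairs": {"Correct": 424, "Incorrect": 23},
    "ConnectMeasures-Comments-Pairs": {"Correct": 419, "Incorrect": 28},
    "ConnectMeasures-GPT-Comments-Pairs": {"Correct": 413, "Incorrect": 34},
}

def share_correct(counts):
    """Proportion of pairs where the uncivil item was correctly identified."""
    total = counts["Correct"] + counts["Incorrect"]
    return counts["Correct"] / total

shares = {name: share_correct(c) for name, c in pair_counts.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Each file's counts sum to its stated N of 447 rows, so the shares are computed over the full set of pairs.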
